Transcriptomic annotation of the Chungtien schizothoracin (Ptychobarbus chungtienensis) using Iso-seq and RNA-seq data

The Chungtien schizothoracin (Ptychobarbus chungtienensis), an endangered fish species endemic to the Zhongdian Plateau, remains underexplored in terms of transcriptomic sequencing. This investigation used tissues from five distinct organs (heart, liver, spleen, kidney, and brain) of the Chungtien schizothoracin for PacBio Iso-seq and RNA-seq analyses, yielding a repertoire of 16,598 full-length transcripts spanning lengths from 363 bp to 7,157 bp. Gene family clustering and phylogenetic analysis encompassed a comprehensive set of 13 fish species, all of which were cyprinids, including the zebrafish and the examined species Ptychobarbus chungtienensis. Moreover, the identification of long non-coding RNAs (lncRNAs) and coding sequences was accomplished across all five tissues. Comprehensive analyses of gene expression profiles and differentially expressed genes among the above five tissues were performed. In summary, the obtained full-length transcripts and detailed gene expression profiles of the Chungtien schizothoracin tissues furnish crucial expression data and genetic sequences, laying the groundwork for future investigations and fostering a holistic comprehension of the adaptive mechanisms inherent in the Chungtien schizothoracin under various conditions.

adaptations, such as responses to high altitude environments, temperature fluctuations, oxygen levels, or food availability. Therefore, transcriptome sequencing is a powerful tool for rapidly improving our understanding of the functional genomics of the Chungtien schizothoracin.
In this investigation, five tissues (heart, liver, spleen, kidney, and brain) from the Chungtien schizothoracin were used for integrated PacBio Iso-seq and RNA-seq analyses. This yielded 16,598 full-length transcripts ranging in length from 363 bp to 7,157 bp. Of these, 15,783 and 14,064 transcript sequences aligned to the NCBI non-redundant protein sequences (Nr) and Swiss-Prot databases, respectively. Further annotation assigned 14,690, 13,609, and 11,406 transcripts to the Kyoto Encyclopedia of Genes and Genomes (KEGG), euKaryotic Ortholog Groups (KOG), and Gene Ontology (GO) databases, respectively. Beyond coding sequences, long non-coding RNAs (lncRNAs) were also identified across all the sampled tissues. Together, the full-length transcripts and gene expression profiles constitute a genetic resource that supports functional genomic studies and lays the foundation for a better understanding of the adaptive mechanisms governing the survival of the Chungtien schizothoracin.

Collection of samples and preparation of RNA.
A female individual was obtained during the Aquatic Biological Resources Survey conducted by the Diqing Prefecture Institute of Animal Husbandry and Veterinary Science (collected from the Luoji section of the Luoji River in Shangri-La City). It was 4-6 months old, and five tissues (heart, liver, spleen, kidney, and brain) were promptly frozen in liquid nitrogen. RNA was isolated from each tissue individually with TRIzol reagent (Invitrogen) following the manufacturer's protocol. The extracted RNA was then added to DNase I buffer, followed by an appropriate amount of DNase I enzyme, and digested at 37 °C for 30 minutes. Subsequently, DNase I stop solution was added to terminate the digestion. Finally, the RNA was purified and checked to ensure that it did not contain any residual DNA. RNA concentration and purity were first assessed with a Nanodrop 2000, while agarose gel electrophoresis was used to check for genomic contamination, purity, and RNA integrity. The RIN (RNA Integrity Number) value was then determined on the Agilent 2100 platform.
PacBio library construction and sequencing. For the PacBio sequencing library, equal proportions of qualified RNA from the five tissues were pooled for library construction at Wuhan Frasergen Gene Biotechnology Co., Ltd (Wuhan, China). Because eukaryotic mRNA carries a poly(A) tail at its 3′ end, oligo(dT) primers were annealed to the poly(A) tail to prime reverse transcription, and primers were added to the ends of the resulting full-length cDNA. The full-length cDNA was amplified by PCR, and the product was purified with PB magnetic beads to remove small cDNA fragments shorter than 1 kb. The ends were then repaired and SMRT dumbbell adapters were ligated. Unligated fragments were digested with exonuclease, and the product was purified with PB magnetic beads to obtain the sequencing library. After library construction, Qubit 3.0 was used for accurate quantification and the Agilent 2100 was used to check the library size. Sequencing was carried out only after the library size met expectations. The raw data from the sequencer were filtered and processed using SMRTlink 10.0 software, with the parameter -min-Length = 50 set to exclude sequences shorter than 50 bp. The data remaining after this filtering step were considered valid.
mRNA library construction and sequencing. Total RNA was extracted using Trizol (Invitrogen, CA, USA). RNA purity and integrity were monitored with a NanoDrop 2000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA) and a Bioanalyzer 2100 system (Agilent Technologies, CA, USA), and RNA contamination was assessed on a 1.5% agarose gel. mRNA from the five tissues of the Chungtien schizothoracin was enriched using Oligo(dT) magnetic beads. Following the manufacturer's instructions, mRNA was purified and libraries were constructed using the VAHTS Universal V6 RNA-seq Library Kit for MGI (Vazyme, Nanjing, China). The Qubit 2.0 Fluorometer (Life Technologies, Carlsbad, CA, USA) and the Agilent 2100 system (Agilent Technologies, CA, USA) were used to evaluate the quality and size of the libraries. Sequencing was then performed on the MGISEQ-2000 platform at Wuhan Frasergen Gene Biotechnology Co., Ltd (Wuhan, China).
PacBio Iso-seq data processing. As shown in Table 1, PacBio Iso-seq yielded 20,724,543 subreads. The ccs tool was used to derive circular consensus sequences (CCS) from the subreads, with the --all parameter to retain all quality CCS, giving 396,323 CCS. Subsequent refinement with the lima tool removed chimeric sequences and poly(A) tails, resulting in 261,878 full-length non-chimeric (FLNC) reads with an average length of 2,133 bp. IsoSeq 3 was used to cluster and deduplicate the reads, forming consensus sequences by aggregating similar sequences. These transcripts were then corrected and deduplicated with LoRDEC, yielding 16,598 high-quality consensus isoforms with an accuracy above 99% and an average length of 2,207.3 bp (Fig. 1A). This non-redundant full-length transcript dataset showed high integrity and served as the foundation for subsequent analyses. [...] (Fig. 1B). Furthermore, the isoforms were mapped to the KOG database for functional classification (Fig. 1C). GO and KEGG pathway analyses (Fig. 2) were conducted using TBtools-II 17 to identify annotated sequences and explore active biological pathways in the Chungtien schizothoracin. For instance, KEGG analysis showed that the unigenes fall mainly into six categories: metabolism, genetic information processing, environmental information processing, cellular processes, organismal systems, and human diseases. Furthermore, we used the TransDecoder software to predict and examine the coding sequences of the Chungtien schizothoracin.
For clustering analysis based on sequence similarity (Fig. 3A), the protein sequences of 13 species, as listed later, were used. All alignments used an E-value threshold of <1e−5. As shown in Fig. 3A, a total of 14,910 genes were identified in the Chungtien schizothoracin, with multiple homologous genes clustered among the 13 fish species, comprising 12 cyprinid species and the zebrafish. A phylogenetic tree was then constructed using OrthoFinder with the maximum likelihood method (Fig. 3B). The results showed that the Chungtien schizothoracin is closely related to Onychostoma macrolepis, a large-scaled scraper in the family Cyprinidae. Overall, the phylogenetic relationships are consistent with the classification and evolutionary standing of these species.
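The E-value cutoff applied to these alignments can be illustrated with a short Python sketch. It assumes the hits are available in the standard 12-column BLAST tabular layout, where the E-value is the eleventh column; the text does not specify the file format, and the function name and the toy hits are hypothetical.

```python
def filter_hits(lines, max_evalue=1e-5):
    """Keep tabular alignment hits whose E-value is below the cutoff.

    Assumes the 12-column BLAST tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if float(fields[10]) < max_evalue:  # column 11 holds the E-value
            kept.append(fields)
    return kept


# Hypothetical hits: one strong match and one weak match.
toy_hits = [
    "tx1\tprotA\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-50\t400",
    "tx2\tprotB\t35.0\t60\t39\t0\t1\t60\t5\t64\t0.002\t35",
]
kept_hits = filter_hits(toy_hits)
```

Only the first toy hit passes the cutoff, since 0.002 is above 1e-5.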
Additionally, we predicted the coding sequence (CDS) of all full-length transcripts. As illustrated in Fig. 4A, CDS lengths in the Chungtien schizothoracin are distributed around an average of 1,152 bp. The longest and shortest CDS are 5,805 bp and 279 bp, respectively. Long non-coding RNAs (lncRNAs) play a crucial role in the regulation of growth and development in many fish species. Here, potential lncRNAs were predicted using the CPC2, CPAT, PLEK, and CNCI tools. As shown in Fig. 4B, Venn diagram analysis of the predictions from these four software packages identified a total of 773 lncRNAs.
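A minimal Python sketch of how predictions from several tools might be combined is given here. It assumes that the consensus lncRNA set is the intersection of the four tools' non-coding calls, which the text does not state explicitly; the function name and transcript IDs are hypothetical.

```python
def consensus_noncoding(calls_by_tool):
    """Return transcript IDs called non-coding by every tool.

    calls_by_tool maps a tool name to the set of transcript IDs that the
    tool classified as non-coding. Taking the intersection is an assumption.
    """
    sets = [set(ids) for ids in calls_by_tool.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical calls from the four tools named in the text.
toy_calls = {
    "CPC2": {"t1", "t2", "t3"},
    "CPAT": {"t2", "t3", "t4"},
    "PLEK": {"t2", "t3"},
    "CNCI": {"t2", "t3", "t5"},
}
shared = consensus_noncoding(toy_calls)
```

In this toy input only t2 and t3 are reported by all four tools, so they form the consensus set.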

Identification of differentially expressed genes in
Cluster analysis of gene families and phylogenetic analysis. For cluster analysis based on sequence similarity, protein sequences from 13 fish species, comprising 12 cyprinid species and the zebrafish (C. auratus, C. carassius, C. carpio, C. gibelio, C. idella, D. rerio, L. rohita, O. macrolepis, P. chungtienensis, P. tetrazona, S. anshuiensis, S. grahami, S. rhinocerous), were used. Redundant sequences and alternative splice variants were removed, retaining only the longest transcript. Sequence similarity searches were carried out with the "-S diamond" option, and the analysis used the "-M msa -T iqtree" parameters to construct a rooted STAG (Species Tree Inference from All Genes) species phylogenetic tree using the maximum likelihood method 22 .
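The step of retaining only the longest transcript can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' script: it assumes each sequence is already tagged with a gene identifier, and the record layout, function name, and toy sequences are hypothetical.

```python
def longest_per_gene(records):
    """Keep the longest sequence for each gene.

    records is an iterable of (gene_id, transcript_id, sequence) tuples.
    Returns a dict mapping gene_id to (transcript_id, sequence).
    Ties keep the first record seen.
    """
    best = {}
    for gene_id, transcript_id, sequence in records:
        current = best.get(gene_id)
        if current is None or len(sequence) > len(current[1]):
            best[gene_id] = (transcript_id, sequence)
    return best


# Hypothetical protein records: two isoforms of geneA, one of geneB.
toy_records = [
    ("geneA", "geneA.1", "MKT"),
    ("geneA", "geneA.2", "MKTLLV"),
    ("geneB", "geneB.1", "MA"),
]
selected = longest_per_gene(toy_records)
```

Reducing each gene to one representative keeps splice variants from being counted as separate members of a gene family.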

Data Records
All RNA-seq raw reads of the Chungtien schizothoracin were deposited in the Sequence Read Archive (SRA) of the National Center for Biotechnology Information under accession numbers SRR28201811, SRR28201812, SRR28201813, SRR28201814, SRR28201815, and SRR28201816 [23][24][25][26][27][28] . In addition, all raw full-length Iso-seq reads of the Chungtien schizothoracin were deposited in the SRA under accession number SRR28340047 29 .

Technical Validation
To characterize the global mRNA expression patterns in the Chungtien schizothoracin, samples from five tissues (heart, liver, spleen, kidney, and brain) were sequenced. The quality of the RNA-seq clean reads was evaluated using FastQC, which showed an average clean-read rate (clean reads/original reads) of 98.2%. The average Q20 and Q30 were 96.4% and 88.6%, respectively. The GC content distribution in the tissue samples followed a normal pattern, indicating the absence of sequencing data contamination. Because no reference genome is available for the Chungtien schizothoracin, clean reads from the five tissue samples were aligned against the isoforms generated by PacBio Iso-seq. Subsequent correlation analysis (Fig. 5A) and examination of gene expression profiles across the five tissues (Fig. 5B) provided insights into the mRNA expression dynamics of the Chungtien schizothoracin.
Fig. 3 Cluster analysis of gene families and phylogenetic analysis. Cluster analysis of gene families of 13 fish species and phylogenetic analysis of 13 fish species through a rooted STAG species phylogenetic tree using the maximum likelihood method.
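The Q20 and Q30 percentages quoted in the validation paragraph are the fractions of bases with Phred quality of at least 20 and 30. A minimal Python sketch of that calculation follows; it assumes Phred+33 encoded FASTQ quality strings, which the text does not state, and the function name and toy quality strings are hypothetical.

```python
def q_fractions(quality_strings, offset=33):
    """Return (Q20 fraction, Q30 fraction) over all bases.

    quality_strings is an iterable of FASTQ quality lines.
    offset=33 assumes Phred+33 encoding.
    """
    total = q20 = q30 = 0
    for quality in quality_strings:
        for char in quality:
            score = ord(char) - offset  # decode the Phred score
            total += 1
            q20 += score >= 20
            q30 += score >= 30
    if total == 0:
        raise ValueError("no bases supplied")
    return q20 / total, q30 / total


# Hypothetical reads: 'I' decodes to 40, '5' to 20, '+' to 10.
toy_qualities = ["IIII5555", "++++IIII"]
q20_fraction, q30_fraction = q_fractions(toy_qualities)
```

For the toy input, 12 of 16 bases reach Q20 and 8 of 16 reach Q30.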

Functional annotation of unigenes. Unigenes in the Chungtien schizothoracin were functionally annotated by comparing them against seven databases, namely NR, NT, Swiss-Prot, Pfam, GO, KEGG, and COG. For the NT database analysis, the BLAST software with an
the different tissues. Correlation analysis and RPKM (Reads Per kb per Million reads) values were used to determine gene expression levels across the five tissues of the Chungtien schizothoracin. Differentially expressed genes (DEGs) were identified
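The RPKM measure named in this paragraph can be written out as a short Python sketch. It uses the standard definition (read count scaled by transcript length in kb and by library size in millions of mapped reads); the function name and the numbers are hypothetical and are not taken from the study.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = read_count * 1e9 / (transcript_length_bp * total_mapped_reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("transcript length and library size must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical case: 500 reads on a 2,000 bp isoform in a library
# of 20 million mapped reads.
example_rpkm = rpkm(500, 2000, 20_000_000)
```

Dividing by both length and library size makes values comparable between isoforms of different lengths and between tissues sequenced to different depths.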

Fig. 2 GO ontology (A) and KEGG pathways (B) analysis for the annotated sequences of the Chungtien schizothoracin.
Ethics statement.
All fish handling and experimental procedures in this study were approved by the Animal Care and Use Committee of Yunnan Agricultural University (Approval Code: 202212001, Approval Date: 1 December 2022).

Table 1. Statistics of Iso-seq sequencing in the Chungtien schizothoracin.